Concept
scalable computing
Parents
Children
Cloud Computing, Data Retrieval, Dew Computing, Distributed Data Storage, Distributed Programming
Publications: 1.9K
Citations: 120K
Authors: 5.8K
Institutions: 1.2K
Fault-Tolerant Data-Parallel Clusters
2002 - 2008
Between 2002 and 2008, research on scalable data processing across large clusters and grids converged on map-reduce–style workflows and web-scale architectures that combine data-parallel programming models, distributed scheduling, and data locality. Reliability and observability became core design goals, supporting high-speed cluster monitoring, fault-tolerant execution, and highly available web services across wide-area networks. Work on inter-node communication and acceleration produced fast collective operations, NIC-based reductions, and scalable web server accelerators, while other efforts defined quantitative scalability metrics and engineering practices for measuring and improving distributed systems.
• Unified patterns for scalable data processing across large clusters and grids, integrating data-parallel programming models, distributed scheduling, and data locality in map-reduce–style workflows and web-scale architectures [1], [5], [6], [14], [16].
• Patterns of reliability and observability that enable scalable systems: high-speed cluster monitoring, fault-tolerant execution, wide-area monitoring, and highly available web services [3], [8], [9], [11], [13].
• Patterns optimizing inter-node communication and acceleration for large clusters, including fast collective operations, NIC-based reductions, and scalable web server accelerators [2], [12], [13], [15].
• Quantitative and engineering patterns for defining, measuring, and improving scalability, including scalability metrics, execution-time tradeoffs, software-engineering considerations, and malleability and migratability in distributed applications [7], [17], [18], [19].
Unified Cluster Resource Management
2009 - 2021